Construction of a genetic map

require(ASMap)
require(RColorBrewer)
require(dplyr)
require(ggplot2)
require(reshape2)
require(ggparallel)
require(wgaim)
setwd("/home/exserta/Documents/master_project_noelle/projects/BioNano-exserta/GBS_LepMAP3")
dopdf = TRUE # to create pdf document

SNP and Genotype calling overview

Heterozygous sites have been replaced by NA. The number of heterozygous sites was 37564, the number of axillaris sites 143739 and the number of exserta sites 179837. The heterozygosity, taken as the share of heterozygous sites among all called sites, was therefore 0.1040151 (about 10.4 %).
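The heterozygosity figures quoted in this report depend on the denominator that is used. A minimal sketch in base R, using only the three counts given above (the variable names are illustrative and not part of the original script):

```r
# genotype counts quoted in the text above
n_het <- 37564
n_ax  <- 143739
n_ex  <- 179837

# share of heterozygous calls among all called genotypes
het_of_all <- n_het / (n_het + n_ax + n_ex)

# ratio of heterozygous to homozygous calls (denominator excludes HET)
het_to_hom <- n_het / (n_ax + n_ex)

round(c(het_of_all = het_of_all, het_to_hom = het_to_hom), 7)
```

The first value corresponds to 0.1040151, the second to the 0.1160902 computed from the genotype file further down.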

Genotype frequencies among individuals : number of markers per individual which are of a given genotype.

indvs <- read.table("F7-K/LM3_F7_K.INDIVIDUAL.genoSUMMARY.csv", header=T, sep=",")
i_nmar <- unique(indvs$tot)

print(as.character(indvs[which(indvs$NA. > 1000), "individual"])) # are removed in cleaned marker set
## [1] "RIL_35" "RIL_44" "RIL_67" "RIL_88"
ggplot(data = indvs, aes(AX, EX)) + geom_point(alpha = 0.2)

if(dopdf == T){ggsave("figures/filtering1.pdf")}
## Saving 7 x 5 in image
indvs <- melt(indvs)
## Using individual as id variables
indvs <- indvs %>% filter(variable != "tot")
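# check for excessive missing data; after melt() the counts are in `value`,
# so the wide-format test indvs$NA. > 1000 would always return 0 rows
indvs[which(indvs$variable == "NA." & indvs$value > 1000), ]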
ggplot(indvs, aes(factor(variable), value / i_nmar)) +
  geom_jitter(height = 0, width = 0.45, size = 0.2, colour = "grey50") + geom_violin(aes(fill = variable), draw_quantiles = 0.5) + labs(x = "Genotypes", y="Frequency", title = "Genotype frequencies among individuals", subtitle = paste("Median : AX = ", median(indvs[indvs$variable == "AX", "value"] / i_nmar), " EX = ", median(indvs[indvs$variable == "EX", "value"] / i_nmar), "Missing = ", median(indvs[indvs$variable == "NA.", "value"] / i_nmar)))

if(dopdf == T){ggsave("figures/filtering2.pdf")}
## Saving 7 x 5 in image

Genotype frequencies among markers : number of individuals per marker which are of a given genotype.

mrks <- read.table("F7-K/LM3_F7_K.MARKER.genoSUMMARY.csv", header=T, sep=",")
mrks <- melt(mrks)
## Using marker as id variables
mrks <- mrks %>% filter(variable != "tot")

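# check for excessive missing data; after melt() the counts are in `value`,
# so the wide-format test mrks$NA. > 100 would always return 0 rows
mrks[which(mrks$variable == "NA." & mrks$value > 100), ]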
ggplot(data = mrks, aes(value, fill = variable)) +
  geom_density(alpha = 0.2)

if(dopdf == T){ggsave("figures/filtering3.pdf")}
## Saving 7 x 5 in image
ggplot(mrks, aes(factor(variable), value/195)) + 
  geom_jitter(height = 0, width = 0.45, size = 0.1, colour = "grey50") + geom_violin(aes(fill = variable), draw_quantiles = 0.5) +
  labs(x = "Genotypes", y="Frequency", title = "Genotype frequencies among markers", subtitle = paste("Median AX = ", median(mrks[mrks$variable == "AX", "value"] / 195), "EX = ", median(mrks[mrks$variable == "EX", "value"] / 195), "missing = ", median(mrks[mrks$variable == "NA.", "value"] / 195)))

if(dopdf == T){ggsave("figures/filtering4.pdf")}
## Saving 7 x 5 in image
geno <- read.table("F7-K/F7-K.geno.csv", header=T, sep=",")
nhet <- sum(as.vector(apply(geno == "HET", 2, sum)), na.rm=T) / (sum(as.vector(apply(geno == "AX", 2, sum)), na.rm=T) + sum(as.vector(apply(geno == "EX", 2, sum)), na.rm=T))

Heterozygosity as HET / (AX + EX) : 0.1160902 (a proportion, i.e. about 11.6 %)

Map Construction

read in genotype data

In the input file, axillaris genotype is encoded by “AX”, heterozygous as “HET” and exserta as “EX”. Missing data is encoded as “-”.

input file    read.cross    f7_K
AX            AA            1
HET           AB
EX            BB            2
-             NA
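The coding above can be written as a small lookup, shown here in base R on a made-up genotype vector. This only illustrates the mapping. In the analysis itself read.cross performs the conversion through its genotypes and na.strings arguments, and the names code_map and calls are hypothetical.

```r
# lookup from input codes to AA/AB/BB labels ("-" is treated as missing)
code_map <- c(AX = "AA", HET = "AB", EX = "BB", "-" = NA)

# hypothetical genotype calls for one marker
calls <- c("AX", "EX", "-", "HET", "EX")

# translate each call through the lookup and drop the names
recoded <- unname(code_map[calls])
recoded
```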
f7_K <- read.cross(format = "csv", file = "F7-K/LM3_F7_K.markers.clean.csv", F.gen = 7, genotypes = c("AX", "HET", "EX"), na.strings = "-")#, crosstype = "riself")
##  --Read the following data:
##   191  individuals
##   1852  markers
##   1  phenotypes
## Warning in summary.cross(cross): Strange genotype pattern.
## Warning in max(maxsp, na.rm = TRUE): no non-missing arguments to max;
## returning -Inf
##  --Cross type: bcsft
f7_K <- convert2riself(f7_K)

The population needs to be converted to riself (a selfing RIL population after many generations). This assumes heterozygosity to be 0. The heterozygotes were manually removed.
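For context on the zero-heterozygosity assumption: under selfing without selection, and assuming an F1 that is heterozygous at every segregating marker, the expected heterozygosity halves with each generation. A short worked example (the function name is illustrative):

```r
# expected heterozygosity after selfing to generation F_n,
# assuming an F1 that is heterozygous at all segregating markers
expected_het <- function(f_gen) 0.5^(f_gen - 1)

expected_het(2)  # F2: 0.5
expected_het(7)  # F7: 0.015625
```

Under these assumptions only about 1.6 % of genotypes are expected to be heterozygous in an F7, which is why treating the population as riself is a common approximation.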

pulling markers

Pull out markers which are co.located. Why not also remove markers which show linkage disequilibrium? We expect and see LD. However, we don’t want to lose too many markers. Therefore we let it be.

f7_K <- pullCross(f7_K, type = "co.located")

Cluster markers to LGs and order within

Q : Why does ordering not reduce the genetic distance? If the distance is inflated by small ordering errors within the chromosomes, reordering should reduce the number of crossovers found between markers, at least slightly. A : A reduction is not guaranteed, although it can happen. The absence of a reduction is therefore normal and not a sign of a low-quality map.

mvest.bc = T is required at this step; without it, the clustering into linkage groups performs worse. mvest.bc imputes missing marker scores before the markers are clustered into linkage groups.

#set bychr = FALSE to allow complete reconstruction of map 
map1 <- mstmap.cross(f7_K, bychr = F, dist.fun = "kosambi", trace = TRUE, detectBadData = F, p.value = 1e-09, mvest.bc = T, return.imputed = T)

# order markers within linkage groups
map1 <- mstmap.cross(map1, bychr = T, dist.fun = "kosambi", trace = TRUE, detectBadData = F, p.value = 1e-09, mvest.bc = F, return.imputed = T)
summary(map1)
## Warning in summary.cross(map1): Some markers at the same position on chr L.
## 1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
##     RI strains via selfing
## 
##     No. individuals:    191 
## 
##     No. phenotypes:     1 
##     Percent phenotyped: 100 
## 
##     No. chromosomes:    12 
##         Autosomes:      L.1 L.10 L.11 L.12 L.2 L.3 L.4 L.5 L.6 L.7 L.8 L.9 
## 
##     Total markers:      744 
##     No. markers:        182 3 1 1 82 93 79 144 75 70 4 10 
##     Percent genotyped:  89.8 
##     Genotypes (%):      AA:46.4  BB:53.6

Quality Control 1

before pushing back in markers.

The expected recombination rate is 1 per generation and chromosome. Plotting profileGen per chromosome therefore requires xo.lambda = 7. If it is plotted for all linkage groups, xo.lambda should be set to 49.
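The arithmetic behind these two values can be sketched as follows. The variable names are illustrative, and the count of seven linkage groups is an assumption based on the loops over L.1 to L.7 used in this report.

```r
n_generations    <- 7  # F7 population (F.gen = 7)
n_linkage_groups <- 7  # assumption: the seven main linkage groups L.1 to L.7
xo_per_chr_gen   <- 1  # expected crossovers per chromosome and generation

# expected crossovers per line on one chromosome
xo_lambda_chr <- xo_per_chr_gen * n_generations

# expected crossovers per line summed over all linkage groups
xo_lambda_all <- xo_lambda_chr * n_linkage_groups

c(per_chromosome = xo_lambda_chr, all_groups = xo_lambda_all)
```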

heatMap(map1, lmax=15)
## Warning in heatMap(map1, lmax = 15): Running est.rf.

for(i in paste("L.", seq(1,7), sep="")){
  profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i)
}

profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().

mean(countXO(map1))
## [1] 10.46073

Plot interpretation

heatMap: see clear linkage groups.

profileGen (per chromosome) :
* No pink circles in the “per-chromosome-plots”.
* Crossovers : the expected number of crossovers per chromosome is ~ 7 (because of 7 generations of RIL). Fewer is okay, because heterozygosity was removed artificially: for each heterozygous marker, a potential double crossover was removed.

profileMark :
* Seg Distortion : do not use it as a quality criterion. A higher value does, however, mark regions with strong segregation distortion, where linkage is not as expected; we would expect such behaviour at the speciation island, for example.
* Double Crossovers : 6 is very high. The number of DCOs is expected to be lower after imputation; around 2-3 would be OK.

wgaim to impute markers

“You can then use cross2int() in this package to perform a smart imputation on your linkage map. It does two things, it condenses the co-located markers into unique markers and this imputes most missing alleles. Then remaining missing values are then imputed using a probabilistic numerical flanking marker algorithm.” cross2int converts cross data to “interval” data and imputes missing markers.

# push back in markers
map1 <- pushCross(map1, type="co.located")
# order again
map1 <- mstmap(map1, bychr = T, dist.fun = "kosambi", trace = TRUE, detectBadData = F, p.value = 1e-09, mvest.bc = F, return.imputed = T)
# saveRDS(map1, "map1.rds")

Some markers now share the same map position. jittermap could be used to move them apart, but that would introduce artificial distances between markers. TODO : decide whether to do that.
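To get a feeling for the size of such an artificial distance, the following base R sketch spreads tied positions by a tiny increment. The helper spread_ties is hypothetical and is not the qtl implementation of jittermap; the positions are the first six of L.1 as printed in this report.

```r
# first positions of L.1 in cM, including three co-located markers
pos <- c(0, 0.9328071, 0.9328071, 0.9328071, 1.9957285, 3.0786914)

# illustrative helper: shift the k-th marker by (k - 1) * amount so that
# tied positions become strictly increasing
spread_ties <- function(pos, amount = 1e-6) {
  pos + (seq_along(pos) - 1) * amount
}

jittered <- spread_ties(pos)
all(diff(jittered) > 0)   # TRUE if no two markers share a position any more
max(abs(jittered - pos))  # largest shift introduced (about 5e-06 cM here)
```

With an increment of this size the shift is negligible compared to the distances between distinct positions, but it is still an artefact that does not come from the data.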

create pdf of graphs (QC)

if(dopdf == T){
  pdf("figures/QC_map_before_imp.pdf", onefile=T, paper="a4r", width = 11)
  heatMap(map1, lmax=15)
  for(i in paste("L.", seq(1,7), sep="")){
    profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i)
  }
  profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
  dev.off()
}
## Warning in heatMap(map1, lmax = 15): Running est.rf.
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
## png 
##   2

Impute with wgaim

map1 <- readRDS("map1.rds")
# impute
map1 <- cross2int(map1, id="Genotype", rem.mark=F) # rem.mark = F to not take out colocated markers from the map
## Warning in miss.q(el$theta, el$imputed.data): Line RIL_50 has missing
## values across the whole of a chromosome, .. These have been replaced by
## 0's.
# geno contains the full genetic map
map1$geno$L.1$map[1:10]
##  Peex113Ctg17958_624309  Peex113Ctg17958_624312  Peex113Ctg17958_866375 
##               0.0000000               0.9328071               0.9328071 
##  Peex113Ctg17958_866386 Peex113Ctg17961_1433683  Peex113Ctg17963_181442 
##               0.9328071               1.9957285               3.0786914 
##  Peex113Ctg17963_238591  Peex113Ctg17963_238690  Peex113Ctg17966_963040 
##               3.0786914               3.0786914               4.5777366 
##  Peex113Ctg18098_811695 
##               8.4575431
# imputed contains unique markers
map1$imputed.geno$L.1$map[1:4]
##  Peex113Ctg17958_624309  Peex113Ctg17958_624312 Peex113Ctg17961_1433683 
##               0.0000000               0.9328071               1.9957285 
##  Peex113Ctg17963_181442 
##               3.0786914
saveRDS(map1, "map1_imputed.rds")
write.cross(map1, format="csv") # saves map as "data.csv"
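The relation between the full map in geno and the unique markers in imputed.geno can be reproduced on the first six L.1 markers printed above. This is only a sketch of the idea of keeping one marker per position; it is not how cross2int works internally, and the object names are illustrative.

```r
# first six markers of L.1 as printed above (cM)
full_map <- c(Peex113Ctg17958_624309  = 0.0000000,
              Peex113Ctg17958_624312  = 0.9328071,
              Peex113Ctg17958_866375  = 0.9328071,
              Peex113Ctg17958_866386  = 0.9328071,
              Peex113Ctg17961_1433683 = 1.9957285,
              Peex113Ctg17963_181442  = 3.0786914)

# keep the first marker at each distinct position
unique_map <- full_map[!duplicated(full_map)]
names(unique_map)
```

The four names that remain are the same four markers shown in the imputed.geno output.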

Plotting the map

plot.map(map1, main = "Genetic map, imputed")

# link.map(map1, chr="L.3")
# geno.image(map1, main = "Genetic map, imputed",alternate.chrid=T)
# knitr::kable(head(geno.table(map1)))

Keep in mind that linkage group 8 (L.8) possibly consists of the markers which could not be associated with any other group. It may contain markers from chloroplast or mitochondrial DNA.

geno.table returns a table of the genetic map:

The p-value comes from a chi-square test for Mendelian segregation: are the observed genotype counts compatible with the expected ones? Formula : \(\chi^2 = \sum \frac{(O - E)^2}{E}\), summed over all genotype classes, where \(O\) is the observed and \(E\) the expected count.

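Markers with a low p-value deviate from the expected segregation ratio, i.e. they show segregation distortion. A high p-value means the observed counts are compatible with the expectation.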
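A worked example of this test for a single marker, in base R. The genotype counts are hypothetical, and the 1:1 expectation assumes a RIL population without heterozygotes.

```r
# hypothetical genotype counts for one marker in a RIL population
observed <- c(AA = 80, BB = 111)
expected <- sum(observed) * c(AA = 0.5, BB = 0.5)  # 1:1 expectation

# chi-square statistic: sum of (O - E)^2 / E over the genotype classes
chi_sq  <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

# the same test with base R's chisq.test
builtin <- chisq.test(observed, p = c(0.5, 0.5))

c(chi_sq = chi_sq, p_value = p_value, neg_log10_p = -log10(p_value))
```

The larger the deviation from 1:1, the larger the statistic and the smaller the p-value, so distorted markers stand out through a high -log10 p-value.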

Quality Control 2

summaryMap(map1)

Heat map

Lower triangle : pairwise LOD scores, upper triangle : pairwise estimated RFs. The heat of the lower triangle should match the heat of the upper triangle. Markers within a linkage group show consistent linkage, and the linkage within groups is much stronger than the linkage between groups. A clear clustering was possible.

A good heat map shows that the construction process was successful. No local problems are visible.

heatMap(map1, lmax=15)
## Warning in heatMap(map1, lmax = 15): Running est.rf.

Check recombination rate

The recombination rate should be plausible; it is one of the key quality characteristics of a map.

Barley : each individual of the population has an expected recombination rate of ~ 14 on a 200 cM chromosome.

Genotypes that exceed the expected recombination rate are highlighted in the graph below.

Calculation of xo.lambda, the expected recombination rate. TODO : continue here.

for(i in paste("L.", seq(1,7), sep="")){
  profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i)
}

profile mark

Profile of individual marker and interval statistics.

profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
## Warning in summary.cross(cross): Invalid genotypes. 
##     Observed genotypes: 0 1 2

Crossovers

# with ABHgenotypeR, it was 13
mean(countXO(map1))
## [1] 10.35602

* seg.dist : profile of the -log10 p-value from a test of segregation distortion for each marker
* dxo : profile of the number of double crossovers occurring at each marker
* erf : profile of recombination fractions for intervals
* lod : profile of the LOD score

create pdf of graphs

if(dopdf == T){
  pdf("figures/QC_map_imputed.pdf", onefile=T, paper="a4r", width = 11)
  heatMap(map1, lmax=15)
  for(i in paste("L.", seq(1,7), sep="")){
    profileGen(map1, stat.type = c("xo", "dxo", "miss"), xo.lambda = 7, chr=i)
  }
  profileMark(map1, stat.type = c("seg.dist", "dxo", "erf", "lod"), id = "Genotype", layout = c(1, 4), type = "l")
  plot.map(map1, main = "Genetic map, imputed")
  dev.off()
}
## Warning in heatMap(map1, lmax = 15): Running est.rf.
## Warning in summary.cross(cross): Some markers at the same position on chr
## L.1,L.10,L.2,L.3,L.4,L.5,L.6,L.7,L.8,L.9; use jittermap().
## Warning in summary.cross(cross): Invalid genotypes. 
##     Observed genotypes: 0 1 2
## png 
##   2

Association of Linkage groups with Chromosomes

Blast genes in Pax pseudomolecules (PaxChr)

Print map to see contig names and distances of markers to each other.

map1$geno$L.1$map
##  Peex113Ctg17958_624309  Peex113Ctg17958_624312  Peex113Ctg17958_866375 
##               0.0000000               0.9328071               0.9328071 
##  Peex113Ctg17958_866386 Peex113Ctg17961_1433683  Peex113Ctg17963_181442 
##               0.9328071               1.9957285               3.0786914 
##  Peex113Ctg17963_238591  Peex113Ctg17963_238690  Peex113Ctg17966_963040 
##               3.0786914               3.0786914               4.5777366 
##  Peex113Ctg18098_811695  Peex113Ctg18098_811726   Peex113Ctg18118_24362 
##               8.4575431              15.9865907              17.0188319 
##  Peex113Ctg18109_468871 Peex113Ctg18108_1243493 Peex113Ctg18108_1241761 
##              17.0188319              17.8404601              18.6596011 
##  Peex113Ctg18118_817351  Peex113Ctg18118_817355   Peex113Ctg18121_15280 
##              20.0165236              20.0165236              21.5379445 
##   Peex113Ctg18121_15283  Peex113Ctg18118_822185   Peex113Ctg18121_15295 
##              21.5379445              21.5379445              24.8457017 
##   Peex113Ctg18121_15297   Peex113Ctg18121_15303   Peex113Ctg18121_15363 
##              27.7756803              28.5117360              30.5587146 
##   Peex113Ctg18121_15394   Peex113Ctg18121_15420   Peex113Ctg18121_64757 
##              31.7098985              31.7098985              36.2782543 
##   Peex113Ctg08628_39755   Peex113Ctg08984_47666   Peex113Ctg08984_66709 
##              41.3692243              41.3692243              42.7843368 
##   Peex113Ctg00860_16910   Peex113Ctg00860_16930  Peex113Ctg00860_116302 
##              43.9568844              43.9568844              43.9568844 
##  Peex113Ctg17695_690561  Peex113Ctg17696_353203  Peex113Ctg17699_257949 
##              48.4521765              49.1914107              51.0831375 
## Peex113Ctg17903_1178727 Peex113Ctg17903_1178732 Peex113Ctg18472_1119111 
##              53.0107604              53.0107604              54.5418393 
## Peex113Ctg18472_1119132 Peex113Ctg18472_1119119 Peex113Ctg18472_1119147 
##              54.5418393              54.5418393              55.2652219 
## Peex113Ctg18472_1369036 Peex113Ctg17674_1729764 Peex113Ctg17674_1259959 
##              55.9732444              59.6882672              59.6882672 
## Peex113Ctg17674_1259956 Peex113Ctg17674_1259958   Peex113Ctg18651_10760 
##              59.6882672              59.6882672              60.3323127 
##  Peex113Ctg00004_119009   Peex113Ctg18639_25620  Peex113Ctg18664_402032 
##              60.3323127              61.5638813              63.7177793 
##  Peex113Ctg18664_402051  Peex113Ctg18665_125436   Peex113Ctg18666_49208 
##              63.7177793              64.4969610              64.4969610 
##   Peex113Ctg18666_49221   Peex113Ctg18666_49245   Peex113Ctg07912_36464 
##              64.4969610              64.4969610              65.2798569 
##  Peex113Ctg17727_241766  Peex113Ctg17727_241946  Peex113Ctg17727_241978 
##              66.0638840              66.0638840              66.0638840 
##  Peex113Ctg17727_241996  Peex113Ctg17727_242028  Peex113Ctg17727_242072 
##              66.0638840              66.0638840              66.0638840 
##  Peex113Ctg17727_288558  Peex113Ctg17727_288561  Peex113Ctg17727_288627 
##              66.0638840              66.0638840              66.0638840 
##  Peex113Ctg17727_297659  Peex113Ctg17729_212450  Peex113Ctg17729_212750 
##              66.0638840              66.0638840              66.0638840 
##  Peex113Ctg17729_226721  Peex113Ctg17727_241810  Peex113Ctg17727_241932 
##              66.0638840              66.0638840              66.0638840 
##  Peex113Ctg17727_241944  Peex113Ctg17715_249680  Peex113Ctg17714_834831 
##              66.0638840              66.8527750              66.8527750 
##  Peex113Ctg17715_287486  Peex113Ctg17715_472331  Peex113Ctg17714_769330 
##              66.8527750              66.8527750              68.1678862 
##  Peex113Ctg17712_994913  Peex113Ctg17714_703125  Peex113Ctg17714_757638 
##              68.1678862              68.1678862              68.1678862 
##  Peex113Ctg17714_769423  Peex113Ctg17712_994839  Peex113Ctg17712_994886 
##              68.1678862              70.2337100              70.2337100 
##  Peex113Ctg17729_235620  Peex113Ctg17729_235638  Peex113Ctg17609_324476 
##              71.0806389              71.0806389              72.4665555 
##  Peex113Ctg17609_223199  Peex113Ctg17609_223193   Peex113Ctg17609_86903 
##              73.9869299              73.9869299              75.3995740 
##  Peex113Ctg17609_223163  Peex113Ctg17609_324398  Peex113Ctg17609_223172 
##              75.3995740              75.3995740              75.3995740 
##  Peex113Ctg17609_324401  Peex113Ctg18238_909909  Peex113Ctg18237_331163 
##              75.3995740              76.6100367              76.6100367 
##  Peex113Ctg18237_313478  Peex113Ctg17550_127275  Peex113Ctg18233_449126 
##              76.6100367              79.9398654              79.9398654 
##  Peex113Ctg17550_131027  Peex113Ctg18233_542515  Peex113Ctg18233_542564 
##              79.9398654              79.9398654              79.9398654 
##  Peex113Ctg18237_252537  Peex113Ctg18237_252567  Peex113Ctg17550_130992 
##              79.9398654              79.9398654              79.9398654 
##  Peex113Ctg17550_131042  Peex113Ctg17550_131122  Peex113Ctg17550_131125 
##              79.9398654              79.9398654              79.9398654 
##  Peex113Ctg17550_149925  Peex113Ctg17550_149933  Peex113Ctg18700_129727 
##              79.9398654              79.9398654              81.3365510 
##   Peex113Ctg00050_84557   Peex113Ctg10388_80893  Peex113Ctg18684_432949 
##              81.3365510              81.3365510              81.3365510 
##  Peex113Ctg18686_175956  Peex113Ctg18694_197814  Peex113Ctg18694_520285 
##              81.3365510              81.3365510              81.3365510 
##  Peex113Ctg18700_129738   Peex113Ctg00050_84548  Peex113Ctg17550_152697 
##              81.3365510              81.3365510              81.3365510 
##  Peex113Ctg17550_152698  Peex113Ctg00043_151704  Peex113Ctg11741_208907 
##              81.3365510              82.3010152              84.1142089 
##  Peex113Ctg11685_238514   Peex113Ctg18054_87464  Peex113Ctg17841_497544 
##              86.5734234              87.7784443              89.4481909 
##   Peex113Ctg10531_23725  Peex113Ctg18472_716349  Peex113Ctg17769_321336 
##              91.6782688              94.5740174              97.6705579 
##   Peex113Ctg17774_16638  Peex113Ctg17760_706973  Peex113Ctg17763_269681 
##              97.6705579              97.6705579              97.6705579 
##  Peex113Ctg17766_583688   Peex113Ctg17774_16848  Peex113Ctg17769_321296 
##              97.6705579              97.6705579              99.7237082 
##   Peex113Ctg11453_37447   Peex113Ctg01106_37059  Peex113Ctg17769_321179 
##             101.9720907             101.9720907             101.9720907 
##   Peex113Ctg17768_36371   Peex113Ctg11453_37414   Peex113Ctg17776_34542 
##             101.9720907             104.1822244             104.1822244 
##  Peex113Ctg17757_486006  Peex113Ctg05921_557310  Peex113Ctg17757_486009 
##             104.1822244             104.1822244             104.1822244 
##   Peex113Ctg17774_16661  Peex113Ctg17775_574112 Peex113Ctg18492_1154177 
##             104.1822244             104.1822244             104.1822244 
##  Peex113Ctg18652_191625  Peex113Ctg18652_191646   Peex113Ctg00757_62711 
##             104.1822244             104.1822244             106.0525818 
##   Peex113Ctg00790_20622   Peex113Ctg00790_21453   Peex113Ctg18368_14411 
##             106.0525818             106.0525818             107.8189561 
##  Peex113Ctg18369_691023  Peex113Ctg18378_238368  Peex113Ctg18378_238447 
##             107.8189561             107.8189561             107.8189561 
##  Peex113Ctg18378_238430  Peex113Ctg18378_238432  Peex113Ctg18369_691395 
##             107.8189561             107.8189561             107.8189561 
##  Peex113Ctg18364_770226   Peex113Ctg18363_61632   Peex113Ctg18363_61640 
##             107.8189561             109.7641784             109.7641784 
##  Peex113Ctg18364_770220  Peex113Ctg01630_256927  Peex113Ctg16000_353851 
##             109.7641784             112.0558822             112.0558822 
##   Peex113Ctg01616_47771   Peex113Ctg01641_44685   Peex113Ctg01641_44726 
##             112.0558822             112.0558822             112.0558822 
##  Peex113Ctg17792_168019  Peex113Ctg10921_452655  Peex113Ctg17792_517927 
##             113.5612760             113.5612760             113.5612760 
##  Peex113Ctg17792_517951  Peex113Ctg17976_133197  Peex113Ctg17976_133201 
##             113.5612760             115.0606838             115.0606838 
##  Peex113Ctg17976_133204  Peex113Ctg17976_133208  Peex113Ctg17976_394839 
##             115.0606838             115.0606838             115.0606838 
##  Peex113Ctg17976_394945   Peex113Ctg18564_75305   Peex113Ctg18564_75325 
##             115.0606838             115.0606838             115.0606838 
##  Peex113Ctg18564_101517  Peex113Ctg18564_101524  Peex113Ctg18569_280080 
##             115.0606838             115.0606838             115.0606838 
##   Peex113Ctg18575_20244  Peex113Ctg18560_448542  Peex113Ctg18560_448535 
##             115.0606838             115.0606838             115.0606838 
##  Peex113Ctg17898_879722   Peex113Ctg16000_75976  Peex113Ctg17976_394902 
##             115.0606838             115.0606838             115.0606838 
##  Peex113Ctg17976_394915  Peex113Ctg17976_470726  Peex113Ctg17977_595651 
##             115.0606838             115.0606838             115.0606838 
##   Peex113Ctg18564_75351  Peex113Ctg11805_116254   Peex113Ctg18564_75228 
##             115.0606838             116.6490725             116.6490725 
##  Peex113Ctg13981_129677   Peex113Ctg01361_24163   Peex113Ctg01386_32712 
##             118.2568005             118.2568005             118.2568005 
##  Peex113Ctg14844_219591  Peex113Ctg14844_219578  Peex113Ctg14844_219724 
##             118.2568005             118.2568005             118.2568005 
##  Peex113Ctg18551_162160  Peex113Ctg18551_162184  Peex113Ctg18554_204404 
##             118.2568005             118.2568005             118.2568005 
##   Peex113Ctg00675_49295   Peex113Ctg00133_44649  Peex113Ctg17878_842192 
##             119.9594869             119.9594869             121.5846463 
##  Peex113Ctg17878_842332  Peex113Ctg18684_315511  Peex113Ctg14844_397310 
##             121.5846463             121.5846463             121.5846463 
##  Peex113Ctg17995_401824  Peex113Ctg17878_842145   Peex113Ctg07096_63136 
##             122.8473697             122.8473697             122.8473697 
##   Peex113Ctg07222_44677   Peex113Ctg14844_72922   Peex113Ctg14665_15250 
##             122.8473697             124.8971498             124.8971498 
##  Peex113Ctg14844_219569  Peex113Ctg18039_701514  Peex113Ctg18039_701496 
##             124.8971498             124.8971498             124.8971498 
##  Peex113Ctg18039_885833  Peex113Ctg18032_165069  Peex113Ctg18033_188220 
##             124.8971498             126.1531670             126.1531670 
##  Peex113Ctg18033_188227  Peex113Ctg18039_701446  Peex113Ctg09060_154413 
##             126.1531670             126.1531670             126.1531670 
##  Peex113Ctg18039_997060  Peex113Ctg09060_154345   Peex113Ctg14665_15228 
##             126.1531670             126.1531670             126.1531670 
##  Peex113Ctg14569_155929   Peex113Ctg14603_71136   Peex113Ctg14665_15184 
##             126.1531670             126.1531670             126.1531670 
##  Peex113Ctg18048_486597  Peex113Ctg18039_701325  Peex113Ctg18048_486567 
##             127.9686933             127.9686933             127.9686933 
##  Peex113Ctg18048_486591  Peex113Ctg18048_486592  Peex113Ctg18048_486596 
##             127.9686933             127.9686933             127.9686933 
##  Peex113Ctg00724_110996   Peex113Ctg00732_42016  Peex113Ctg18704_324352 
##             130.3289409             130.3289409             130.3289409 
##  Peex113Ctg18321_355103  Peex113Ctg18704_324453  Peex113Ctg18704_324482 
##             130.3289409             130.3289409             132.0070703 
##  Peex113Ctg18704_328307  Peex113Ctg17980_234395  Peex113Ctg17980_256247 
##             132.0070703             133.6705732             133.6705732 
##  Peex113Ctg18048_297027  Peex113Ctg18048_486663  Peex113Ctg18048_486691 
##             133.6705732             133.6705732             133.6705732 
##  Peex113Ctg18048_486711 Peex113Ctg18472_1369045   Peex113Ctg00675_49286 
##             133.6705732             135.9585454             135.9585454 
##  Peex113Ctg18241_430210  Peex113Ctg10921_365598  Peex113Ctg17917_182271 
##             135.9585454             135.9585454             135.9585454 
##  Peex113Ctg17917_182284  Peex113Ctg17917_182999  Peex113Ctg18244_266269 
##             135.9585454             135.9585454             135.9585454 
##  Peex113Ctg18474_415683 Peex113Ctg18472_1405321 Peex113Ctg18472_1405386 
##             135.9585454             135.9585454             135.9585454 
##  Peex113Ctg00825_221647  Peex113Ctg10921_452643  Peex113Ctg18241_430417 
##             135.9585454             135.9585454             135.9585454 
##  Peex113Ctg00825_221646  Peex113Ctg14969_121398   Peex113Ctg09482_37421 
##             138.2263470             140.0745656             142.1132943 
##  Peex113Ctg17849_345861  Peex113Ctg17849_590492  Peex113Ctg17851_317860 
##             142.1132943             142.1132943             142.1132943 
##  Peex113Ctg17851_318051  Peex113Ctg17851_488084  Peex113Ctg17851_488144 
##             142.1132943             142.1132943             142.1132943 
##  Peex113Ctg17849_345854  Peex113Ctg17899_673736   Peex113Ctg18308_61078 
##             144.1680766             146.0939891             146.0939891 
##   Peex113Ctg18308_60876  Peex113Ctg18321_531358  Peex113Ctg17795_207334 
##             146.0939891             146.0939891             146.0939891 
##  Peex113Ctg17782_378627  Peex113Ctg18307_162364  Peex113Ctg18307_162308 
##             146.0939891             146.0939891             146.0939891 
##  Peex113Ctg18293_356552   Peex113Ctg02378_57283   Peex113Ctg18077_95330 
##             147.9902997             147.9902997             147.9902997 
##  Peex113Ctg18077_213781  Peex113Ctg18077_644251  Peex113Ctg17801_106287 
##             147.9902997             147.9902997             147.9902997 
##  Peex113Ctg17801_119184   Peex113Ctg17562_47266  Peex113Ctg17796_213674 
##             147.9902997             147.9902997             147.9902997 
##  Peex113Ctg17796_423919  Peex113Ctg18491_144871  Peex113Ctg18175_206905 
##             147.9902997             147.9902997             147.9902997 
##  Peex113Ctg17786_395689  Peex113Ctg18300_112759  Peex113Ctg18524_345734 
##             147.9902997             147.9902997             147.9902997 
##  Peex113Ctg17555_144155  Peex113Ctg17558_384569  Peex113Ctg18290_473054 
##             147.9902997             147.9902997             147.9902997 
##  Peex113Ctg18524_383463  Peex113Ctg18524_345838  Peex113Ctg17782_386057 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg18000_209744 Peex113Ctg18177_1021727  Peex113Ctg17990_411539 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg17992_306374  Peex113Ctg17543_339206  Peex113Ctg18535_553968 
##             149.5565888             149.5565888             149.5565888 
## Peex113Ctg17994_1126260   Peex113Ctg18128_18953  Peex113Ctg18189_741852 
##             149.5565888             149.5565888             149.5565888 
##   Peex113Ctg07717_90189   Peex113Ctg07717_90377  Peex113Ctg17779_287139 
##             149.5565888             149.5565888             149.5565888 
##   Peex113Ctg17786_52263   Peex113Ctg17786_74627  Peex113Ctg18010_293657 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg18192_141229  Peex113Ctg18472_798455  Peex113Ctg18663_230243 
##             149.5565888             149.5565888             149.5565888 
##   Peex113Ctg18297_45555  Peex113Ctg01095_110011  Peex113Ctg17884_164472 
##             149.5565888             149.5565888             149.5565888 
##   Peex113Ctg05516_90969  Peex113Ctg18297_442075  Peex113Ctg18298_244631 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg18300_112743   Peex113Ctg05516_91001  Peex113Ctg05595_156157 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg05596_329998  Peex113Ctg05695_346092   Peex113Ctg07964_27066 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg00799_190624   Peex113Ctg00994_37422   Peex113Ctg00994_37425 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg10947_182884  Peex113Ctg12981_135809   Peex113Ctg01272_54235 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg13854_123235  Peex113Ctg17545_482530  Peex113Ctg17550_642096 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg17550_642101  Peex113Ctg17550_642137  Peex113Ctg17664_834156 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg17805_232163  Peex113Ctg17861_512516  Peex113Ctg17861_512528 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg17861_512562  Peex113Ctg17861_512572  Peex113Ctg17899_673749 
##             149.5565888             149.5565888             149.5565888 
## Peex113Ctg17903_1178462  Peex113Ctg17917_394608  Peex113Ctg17958_102587 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg17978_199737  Peex113Ctg18043_230175  Peex113Ctg18162_549416 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg18302_307025  Peex113Ctg18472_105306  Peex113Ctg18472_105334 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg18472_105371   Peex113Ctg18491_30670   Peex113Ctg18491_30686 
##             149.5565888             149.5565888             149.5565888 
## Peex113Ctg18492_1047407  Peex113Ctg18498_378930   Peex113Ctg18524_30502 
##             149.5565888             149.5565888             149.5565888 
##   Peex113Ctg18524_30534   Peex113Ctg18524_39246   Peex113Ctg18633_48928 
##             149.5565888             149.5565888             149.5565888 
##  Peex113Ctg18712_173300 Peex113Ctg18492_1047211   Peex113Ctg18633_49044 
##             149.5565888             149.5565888             149.5565888 
##   Peex113Ctg18128_15571   Peex113Ctg07899_24379   Peex113Ctg07899_24371 
##             149.5565888             151.1262275             151.1262275 
##   Peex113Ctg07899_24395   Peex113Ctg07899_40476  Peex113Ctg17543_100270 
##             151.1262275             151.1262275             151.1262275 
##   Peex113Ctg15500_18609   Peex113Ctg07888_66618   Peex113Ctg07899_24268 
##             151.1262275             151.1262275             151.1262275 
##   Peex113Ctg07899_24321   Peex113Ctg07899_24355   Peex113Ctg07899_24356 
##             151.1262275             151.1262275             151.1262275 
##  Peex113Ctg18661_346499  Peex113Ctg18325_192666  Peex113Ctg05596_218706 
##             151.1262275             151.1262275             151.1262275 
##  Peex113Ctg05596_218721  Peex113Ctg05596_329971  Peex113Ctg17884_167193 
##             151.1262275             151.1262275             151.1262275 
##  Peex113Ctg18684_315575  Peex113Ctg17796_213587  Peex113Ctg17786_371785 
##             151.1262275             151.1262275             152.9366904
# ctgnames <- as.vector(names(map1$imputed.geno$L.1$map))
# ctgs <- vector() ; for(i in ctgnames){ctgs[i] <- as.vector(strsplit(i, split = "_")[[1]])[1]} ; names(ctgs) <- NULL
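# alternative input (imputed map of another linkage group); kept commented out
# because the next assignment would overwrite it anyway
# ctgs <- as.data.frame(map1$imputed.geno$L.12$map)
ctgs <- as.data.frame(map1$geno$L.1$map)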
ctgs[,"Ctg"] <- substr(rownames(ctgs), start=1, stop=15)
write.csv(ctgs, "ctgs.csv")
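The contig name can also be separated from the marker name without relying on a fixed width of 15 characters. A minimal sketch in base R on three marker names taken from the L.1 output above; reading the number after the underscore as a position on the contig is an assumption.

```r
# marker names as they appear in the map: contig ID, underscore, number
markers <- c("Peex113Ctg17958_624309", "Peex113Ctg08628_39755",
             "Peex113Ctg18472_1119111")

# contig ID: everything before the first underscore
contig <- sub("_.*$", "", markers)

# number after the last underscore (assumed to be the position on the contig)
position <- as.integer(sub("^.*_", "", markers))

data.frame(marker = markers, Ctg = contig, pos = position)
```

For these names the result is identical to substr(markers, 1, 15), but it would also hold if contig IDs of a different length appeared.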

Find the gene name in the P.exserta annotation file, look up its CDS in the P.exserta MRNA file and blast it against the newest P.axillaris genome to check which chromosome it hits. The results are collected in a table in an Excel file; note the position of the marker on the chromosome as well.

Focus on markers at the beginning and end of each chromosome, because the data is more reliable there.

# from where P.exserta annotation is stored
grep '^Peex113Ctg08628\speex113\sgene' P.EXSERTA.contigs.v1.1.3.annotation.v1.gff
# copy out name and search with vim in file
vim P.EXSERTA.contigs.v1.1.3.annotation.v1.MRNA.fasta # search with '/genename'
# copy out CDS and blast with SequenceServer
lgpax <- read.table("PaxChr.csv", header=T, sep = ",")
ggparallel(list('Linkage.group', 'AX.chromosome.best.match'), lgpax)

Compare with Optical mapping Super Scaffolds (OMss)

Are the same contigs together on a chromosome and on a super-scaffold?

ss <- read.table("OMss.csv", sep=",", header=T)
ss[,"OM"] <- as.numeric(as.factor(ss$Super.Scaffold.OM))
ggparallel(list('Linkage.Group','OM'), ss)

We do not see any overlap, this is great.
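If "no overlap" is read as "no super-scaffold has contigs in more than one linkage group", this can also be checked numerically rather than by eye. A sketch in base R on a made-up table with the two column names used above; the real check would use the ss data frame instead of ss_toy.

```r
# hypothetical stand-in for OMss.csv with the two columns used above
ss_toy <- data.frame(
  Linkage.Group     = c("L.1", "L.1", "L.2", "L.2", "L.3"),
  Super.Scaffold.OM = c("ss1", "ss1", "ss2", "ss3", "ss3")
)

# number of distinct linkage groups per super-scaffold
lg_per_ss <- tapply(ss_toy$Linkage.Group, ss_toy$Super.Scaffold.OM,
                    function(x) length(unique(x)))

# super-scaffolds whose contigs fall into more than one linkage group
conflicts <- names(lg_per_ss)[lg_per_ss > 1]
conflicts
```

In the toy data only ss3 is reported, because it appears in both L.2 and L.3. An empty result on the real table would support the statement above.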

Compare with P.axillaris caps markers (caps)

CAPS markers are markers which have already been associated with a chromosome. Align the CAPS markers to the NGS genome and check which contigs they hit.

A blastn database of the P.exserta genome was made with
makeblastdb -in P.EXSERTA.contigs.v1.1.3.fasta -dbtype nucl -parse_seqids
Copy the sequence and name of each marker into the file query.fasta and then blast with
blastn -db P.EXSERTA.contigs.v1.1.3.fasta -query query.fasta -out results.out
Grep “Ctg” in the results file with
grep Ctg results.out
and search for the Ctg names in the linkage groups of the genetic map.
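The last step, searching the contig names in the linkage groups, can be sketched in base R. The list maps and the hit list blast_hits are toy data: two marker names come from the L.1 output in this report, the L.2 marker and the second hit are invented for illustration.

```r
# toy maps: linkage group -> named vector of marker positions (cM)
maps <- list(
  L.1 = c(Peex113Ctg17958_624309 = 0, Peex113Ctg17961_1433683 = 1.9957285),
  L.2 = c(Peex113Ctg99999_100 = 0)  # invented marker
)

# contig IDs taken from the blast results (hypothetical hit list)
blast_hits <- c("Peex113Ctg17961", "Peex113Ctg00001")

# for each hit, list the linkage groups that carry a marker on that contig
hit_lg <- lapply(blast_hits, function(ctg) {
  on_ctg <- vapply(maps,
                   function(m) any(startsWith(names(m), paste0(ctg, "_"))),
                   logical(1))
  names(maps)[on_ctg]
})
names(hit_lg) <- blast_hits
hit_lg
```

With the real cross object, a list of this shape could presumably be built from the per-group map vectors, e.g. lapply(map1$geno, function(x) x$map). A hit that returns no linkage group simply has no marker on that contig in the map.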

#write.cross(map1, format="qtab") # and only use data_location.qtab for searching Ctg names.
caps <- read.table("overviews/caps.csv", sep="\t", header=F)
caps <- cbind(caps, as.numeric(substring(as.character(caps$V4), first=3)))
names(caps)[6] <- "LGs"
names(caps)[1] <- "Pax_Chr"
ggparallel(list('LGs','Pax_Chr'), caps, sub="chr7 absent")

CAPS markers did not help to identify which LG corresponds to which chromosome. This points to errors in either the P.axillaris assembly or the genetic map.

Outlook

Further process the file with ALLMAPS